Hessian Free Deep Learning
Abstract
Optimization techniques used in machine learning play an important role in training neural networks for regression and classification tasks. First-order methods such as gradient descent have predominantly been used to train neural networks, because second-order methods, such as Newton's method, are computationally infeasible. However, second-order methods show much better convergence characteristics than first-order methods, because they also take into account the curvature of the error surface. In addition, first-order methods require considerable application-specific tuning of the learning rate, tend to get trapped in local optima, and exhibit slow convergence. These shortcomings make Newton's method especially attractive for training networks with deep architectures. The reason Newton's method is infeasible is the computation of the Hessian matrix, which takes prohibitively long. Influential work by Pearlmutter [2] led to a method for using the Hessian, through Hessian-vector products, without actually computing it. Recent work [1] used Newton's method without directly forming the Hessian, a form of "Hessian-free" learning, to train a deep network consisting of a number of Restricted Boltzmann Machines. The method exhibited success on the MNIST handwriting recognition data set when used to train a Restricted Boltzmann Machine with Hinton's method [3], yielding a better-quality solution for classification tasks. The proposed work for the CS229 project aims to improve upon the "Hessian-free" (HF) learning method and apply it to different classification tasks. To do this, the Hessian-free learning method will be implemented and the MNIST experiments will be replicated. Through analysis, we aim to propose further modifications that improve the method and to evaluate it on additional classification tasks.
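The core trick the proposal relies on is Pearlmutter's observation [2] that a Hessian-vector product can be obtained for roughly the cost of an extra gradient evaluation, without ever forming the Hessian. Below is a minimal sketch of that idea, assuming a logistic-regression loss and using a finite-difference approximation of the Hessian-vector product (Pearlmutter's R-operator computes it exactly via forward-mode differentiation); all names and values here are illustrative, not the project's implementation.

```python
# Minimal sketch: Hessian-vector products without forming the Hessian.
# Assumptions: logistic-regression loss, finite-difference approximation of Hv.
import numpy as np

def loss_grad(w, X, y):
    """Gradient of the mean logistic loss; labels y are in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

def hessian_vector_product(w, v, X, y, eps=1e-6):
    """Approximate H @ v without ever building H:
    Hv ~= (grad(w + eps*v) - grad(w)) / eps."""
    return (loss_grad(w + eps * v, X, y) - loss_grad(w, X, y)) / eps

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X @ rng.normal(size=5) > 0).astype(float)
w = np.zeros(5)
v = rng.normal(size=5)
print(hessian_vector_product(w, v, X, y))
```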
Similar References
Deep learning via Hessian-free optimization
We develop a 2nd-order optimization method based on the “Hessian-free” approach, and apply it to training deep auto-encoders. Without using pre-training, we obtain results superior to those reported by Hinton & Salakhutdinov (2006) on the same tasks they considered. Our method is practical, easy to use, scales nicely to very large datasets, and isn’t limited in applicability to autoencoders, or...
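To make the structure of such a Hessian-free (truncated-Newton) step concrete: compute the gradient, then solve the damped Newton system with conjugate gradient, touching the curvature matrix only through matrix-vector products. The sketch below is a hedged illustration on a toy quadratic, not Martens' implementation; the damping value, iteration counts, and problem are arbitrary.

```python
# Sketch of the truncated-Newton inner loop used by Hessian-free methods:
# solve (H + damping*I) d = -g with conjugate gradient, using H only
# through matrix-vector products.
import numpy as np

def conjugate_gradient(hvp, g, damping=1e-2, max_iters=50, tol=1e-10):
    """Return an approximate step d solving (H + damping*I) d = -g."""
    d = np.zeros_like(g)
    r = -g.copy()                 # residual of the linear system
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iters):
        Ap = hvp(p) + damping * p
        alpha = rs_old / (p @ Ap)
        d += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d

# Toy problem: f(w) = 0.5 w^T A w - b^T w, so grad = A w - b and H v = A v.
rng = np.random.default_rng(1)
M = rng.normal(size=(6, 6))
A = M @ M.T + np.eye(6)           # symmetric positive definite curvature
b = rng.normal(size=6)
w = np.zeros(6)
for _ in range(5):                # a few outer Newton-like iterations
    g = A @ w - b
    w += conjugate_gradient(lambda v: A @ v, g)
print(np.linalg.norm(A @ w - b))  # near zero at the optimum
```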
Improved Preconditioner for Hessian Free Optimization
We investigate the use of Hessian Free optimization for learning deep autoencoders. One of the critical components in that algorithm is the choice of the preconditioner. We argue in this paper that the Jacobi preconditioner leads to faster optimization and we show how it can be accurately and efficiently estimated using a randomized algorithm.
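A plausible sketch of the kind of randomized diagonal estimate a Jacobi preconditioner needs: for probe vectors v with i.i.d. ±1 entries, E[v ⊙ (Hv)] equals diag(H), so averaging a few probes yields an estimate that can scale the CG residual by 1/diag(H). The paper's exact estimator may differ; this is only an assumption-laden illustration.

```python
# Rough sketch (our assumption of the idea): estimate diag(H) with randomized
# probes -- E[v * (H v)] = diag(H) when v has i.i.d. +/-1 entries -- for use
# as a Jacobi preconditioner inside conjugate gradient.
import numpy as np

def estimate_diagonal(hvp, dim, num_probes=30, rng=None):
    rng = rng or np.random.default_rng(0)
    diag = np.zeros(dim)
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=dim)   # Rademacher probe
        diag += v * hvp(v)
    return diag / num_probes

# Toy check against an explicit matrix; the error shrinks with more probes.
rng = np.random.default_rng(2)
M = rng.normal(size=(8, 8))
H = M @ M.T
d_est = estimate_diagonal(lambda v: H @ v, 8, num_probes=2000, rng=rng)
print(np.max(np.abs(d_est - np.diag(H))))
```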
Investigations on hessian-free optimization for cross-entropy training of deep neural networks
Context-dependent deep neural network HMMs have been shown to achieve recognition accuracy superior to Gaussian mixture models in a number of recent works. Typically, neural networks are optimized with stochastic gradient descent. On large datasets, stochastic gradient descent improves quickly during the beginning of the optimization. But since it does not make use of second order information, ...
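For reference, the baseline this abstract compares against is plain minibatch stochastic gradient descent on a cross-entropy objective. Below is a minimal sketch with a toy linear softmax model; the learning rate, batch size, and data are arbitrary illustrations, not the paper's setup.

```python
# Plain minibatch SGD on a softmax cross-entropy objective (toy linear model).
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(3)
X = rng.normal(size=(512, 10))
y = rng.integers(0, 3, size=512)            # 3 classes
W = np.zeros((10, 3))
lr, batch = 0.1, 64
for step in range(200):
    idx = rng.integers(0, len(X), size=batch)
    probs = softmax(X[idx] @ W)
    probs[np.arange(batch), y[idx]] -= 1.0   # d(cross-entropy)/d(logits)
    W -= lr * X[idx].T @ probs / batch       # first-order gradient step
```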
Block-diagonal Hessian-free Optimization for Training Neural Networks
Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are rarely applied to deep learning in practice because of high computational cost and the need for model-dependent algorithmic variations. We introduce a variant of ...
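As a rough reading of the block-diagonal idea (not the paper's algorithm): partition the parameters into blocks, for example one block per layer, and solve each block's curvature system independently, discarding cross-block terms. A toy sketch with explicit per-block matrices follows; the block sizes and matrices are hypothetical.

```python
# Sketch of a block-diagonal curvature solve: each parameter block gets its
# own Newton-like step, and cross-block curvature is simply ignored.
import numpy as np

rng = np.random.default_rng(4)
sizes = (4, 6, 3)                                    # hypothetical layer sizes
blocks = [rng.normal(size=(n, n)) for n in sizes]
blocks = [B @ B.T + np.eye(len(B)) for B in blocks]  # per-block curvature (SPD)
grads = [rng.normal(size=n) for n in sizes]

# Solve H_b d_b = -g_b independently per block; this is what keeps the
# update cheap compared with a full Hessian-free solve.
steps = [np.linalg.solve(H, -g) for H, g in zip(blocks, grads)]
print([s.shape for s in steps])
```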